Objectives

  • Preprocess and analyze the battery charge/discharge dataset obtained from NASA, then detect and treat anomalies using multiple techniques.

Data Dictionary

  • Voltage_measured: Battery terminal voltage (Volts)
  • Current_measured: Battery output current (Amps)
  • Temperature_measured: Battery temperature (degrees C)
  • Current_load: Current measured at load (Amps)
  • Voltage_load: Voltage measured at load (Volts)
  • Time: Time vector for the cycle (seconds)
  • Capacity: Battery capacity (Ahr) for discharge until 2.7V
  • Datetime: Date and time of the start of the cycle
  • Ambient_temperature: Temperature at which the battery is stored (degrees C)
  • Cycle: Charge/discharge cycle of the battery

Data Description

  • A set of four batteries (# 5, 6, 7 and 18) was run through 2 different operational profiles (charge and discharge) at room temperature. Charging was carried out in constant current (CC) mode at 1.5A until the battery voltage reached 4.2V and then continued in constant voltage (CV) mode until the charge current dropped to 20mA. Discharge was carried out at a constant current (CC) level of 2A until the battery voltage fell to 2.7V, 2.5V, 2.2V and 2.5V for batteries 5, 6, 7 and 18 respectively.

  • Repeated charge and discharge cycles result in accelerated aging of the batteries. The experiments were stopped when the batteries reached the end-of-life (EOL) criterion, which was a 30% fade in rated capacity (from 2Ahr to 1.4Ahr).

  • This data can be further used for the prediction of both remaining charge (for a given discharge cycle) and remaining useful life (RUL).
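The end-of-life criterion described above can be made concrete with a short sketch. The capacity values below are made up for illustration, and `first_eol_cycle` is a hypothetical helper introduced here, not part of the dataset or of this notebook.

```python
import pandas as pd

RATED_CAPACITY_AH = 2.0   # rated capacity stated in the description
EOL_FADE = 0.30           # a 30% fade defines end of life


def first_eol_cycle(capacity, rated=RATED_CAPACITY_AH, fade=EOL_FADE):
    """Return the index label of the first cycle at or below the EOL threshold, or None."""
    threshold = rated * (1.0 - fade)
    below = capacity[capacity <= threshold]
    return None if below.empty else below.index[0]


# Illustrative per-cycle capacities (Ahr) indexed by cycle number; not real data.
toy_capacity = pd.Series([1.86, 1.75, 1.60, 1.45, 1.39, 1.35], index=range(1, 7))

eol_threshold = RATED_CAPACITY_AH * (1.0 - EOL_FADE)
eol_cycle = first_eol_cycle(toy_capacity)
print('EOL threshold (Ahr):', eol_threshold)
print('First cycle at or below threshold:', eol_cycle)
```

The same threshold could serve as a reference line when plotting capacity against cycle for remaining-useful-life analysis.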

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
from scipy.io import loadmat
from mpl_toolkits.mplot3d import Axes3D
from sklearn.preprocessing import StandardScaler, MinMaxScaler
import warnings
import os

warnings.filterwarnings('ignore')
In [2]:
def load_data(file, battery):
    """Load the discharge cycles of one battery from a .mat file into two DataFrames."""
    mat = loadmat(file)
    cycles = mat[battery][0, 0]['cycle'][0]
    print('Total data in dataset:', len(cycles))
    counter = 0
    dataset, capacity_data = [], []

    for i in range(len(cycles)):
        row = cycles[i]
        # Keep discharge cycles only; all other cycle types are skipped
        if row['type'][0] == 'discharge':
            ambient_temperature = row['ambient_temperature'][0][0]
            # row['time'] holds [year, month, day, hour, minute, second]
            date_time = datetime.datetime(
                int(row['time'][0][0]), int(row['time'][0][1]),
                int(row['time'][0][2]), int(row['time'][0][3]),
                int(row['time'][0][4])) + datetime.timedelta(seconds=int(row['time'][0][5]))
            data = row['data']
            capacity = data[0][0]['Capacity'][0][0]
            # One output row per sample recorded within the discharge cycle
            for j in range(len(data[0][0]['Voltage_measured'][0])):
                voltage_measured = data[0][0]['Voltage_measured'][0][j]
                current_measured = data[0][0]['Current_measured'][0][j]
                temperature_measured = data[0][0]['Temperature_measured'][0][j]
                current_load = data[0][0]['Current_load'][0][j]
                voltage_load = data[0][0]['Voltage_load'][0][j]
                time = data[0][0]['Time'][0][j]
                dataset.append([
                    counter+1, ambient_temperature, date_time, capacity,
                    voltage_measured, current_measured, temperature_measured,
                    current_load, voltage_load, time
                ])
                capacity_data.append([counter+1, ambient_temperature, date_time, capacity])
                # NOTE: counter advances once per sample, so the 'cycle' column is a
                # per-battery sample index rather than a discharge-cycle number
                counter += 1
    return [
        pd.DataFrame(data=dataset, columns=[
            'cycle', 'ambient_temperature', 'datetime', 'capacity', 'voltage_measured',
            'current_measured', 'temperature_measured', 'current_load', 'voltage_load', 'time'
        ]),
        pd.DataFrame(data=capacity_data, columns=[
            'cycle', 'ambient_temperature', 'datetime', 'capacity'])
    ]
In [3]:
B0005_dataset, B005_capacity = load_data('battery_dataset/1. BatteryAgingARC-FY08Q4/B0005.mat', 'B0005')
B0005_dataset['flag'] = 1

B0006_dataset, B006_capacity = load_data('battery_dataset/1. BatteryAgingARC-FY08Q4/B0006.mat', 'B0006')
B0006_dataset['flag'] = 2

B0007_dataset, B007_capacity = load_data('battery_dataset/1. BatteryAgingARC-FY08Q4/B0007.mat', 'B0007')
B0007_dataset['flag'] = 3

B0018_dataset, B0018_capacity = load_data('battery_dataset/1. BatteryAgingARC-FY08Q4/B0018.mat', 'B0018')
B0018_dataset['flag'] = 4
Total data in dataset: 616
Total data in dataset: 616
Total data in dataset: 616
Total data in dataset: 319
In [4]:
fuel_cells_df = pd.concat([B0005_dataset, B0006_dataset, B0007_dataset, B0018_dataset],
                          ignore_index=True)

fuel_cells_df.head(10)
Out[4]:
cycle ambient_temperature datetime capacity voltage_measured current_measured temperature_measured current_load voltage_load time flag
0 1 24 2008-04-02 15:25:41 1.856487 4.191492 -0.004902 24.330034 -0.0006 0.000 0.000 1
1 2 24 2008-04-02 15:25:41 1.856487 4.190749 -0.001478 24.325993 -0.0006 4.206 16.781 1
2 3 24 2008-04-02 15:25:41 1.856487 3.974871 -2.012528 24.389085 -1.9982 3.062 35.703 1
3 4 24 2008-04-02 15:25:41 1.856487 3.951717 -2.013979 24.544752 -1.9982 3.030 53.781 1
4 5 24 2008-04-02 15:25:41 1.856487 3.934352 -2.011144 24.731385 -1.9982 3.011 71.922 1
5 6 24 2008-04-02 15:25:41 1.856487 3.920058 -2.013007 24.909816 -1.9982 2.991 90.094 1
6 7 24 2008-04-02 15:25:41 1.856487 3.907904 -2.014400 25.105884 -1.9982 2.977 108.281 1
7 8 24 2008-04-02 15:25:41 1.856487 3.897036 -2.011603 25.317019 -1.9982 2.967 126.453 1
8 9 24 2008-04-02 15:25:41 1.856487 3.887477 -2.018015 25.509423 -1.9982 2.959 144.641 1
9 10 24 2008-04-02 15:25:41 1.856487 3.878959 -2.013135 25.703603 -1.9982 2.951 162.844 1
In [5]:
fuel_cells_df.shape
Out[5]:
(185721, 11)
In [6]:
fuel_cells_df.columns
Out[6]:
Index(['cycle', 'ambient_temperature', 'datetime', 'capacity',
       'voltage_measured', 'current_measured', 'temperature_measured',
       'current_load', 'voltage_load', 'time', 'flag'],
      dtype='object')
In [7]:
fuel_cells_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 185721 entries, 0 to 185720
Data columns (total 11 columns):
 #   Column                Non-Null Count   Dtype         
---  ------                --------------   -----         
 0   cycle                 185721 non-null  int64         
 1   ambient_temperature   185721 non-null  uint8         
 2   datetime              185721 non-null  datetime64[ns]
 3   capacity              185721 non-null  float64       
 4   voltage_measured      185721 non-null  float64       
 5   current_measured      185721 non-null  float64       
 6   temperature_measured  185721 non-null  float64       
 7   current_load          185721 non-null  float64       
 8   voltage_load          185721 non-null  float64       
 9   time                  185721 non-null  float64       
 10  flag                  185721 non-null  int64         
dtypes: datetime64[ns](1), float64(7), int64(2), uint8(1)
memory usage: 14.3 MB
In [8]:
for i in fuel_cells_df.columns:
    print(i, ':', fuel_cells_df[i].nunique())
cycle : 50285
ambient_temperature : 1
datetime : 300
capacity : 636
voltage_measured : 185721
current_measured : 185721
temperature_measured : 185721
current_load : 21
voltage_load : 1835
time : 62016
flag : 4
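The `cycle` column has 50,285 distinct values while `datetime` has only 300, which suggests that `cycle` counts samples rather than discharge cycles. One possible way to recover a per-battery cycle number is to rank the cycle start times within each battery. The sketch below uses a toy frame with illustrative timestamps; the column name `cycle_number` is an assumption introduced here and does not appear elsewhere in this notebook.

```python
import pandas as pd

# Toy frame: several samples share the start time of their discharge cycle.
toy = pd.DataFrame({
    'flag': [1, 1, 1, 1, 2, 2, 2],
    'datetime': pd.to_datetime([
        '2008-04-02 15:25:41', '2008-04-02 15:25:41',
        '2008-04-02 19:43:48', '2008-04-03 00:01:06',
        '2008-04-02 15:25:41', '2008-04-02 19:43:48',
        '2008-04-02 19:43:48',
    ]),
})

# Convert the start time to integer nanoseconds, then take the dense rank
# within each battery: the earliest cycle gets 1, the next distinct start 2, ...
start_ns = toy['datetime'].astype('int64')
toy['cycle_number'] = (
    start_ns.groupby(toy['flag']).rank(method='dense').astype(int)
)
print(toy)
```

With a column like this, per-cycle quantities such as capacity could be aggregated with `groupby(['flag', 'cycle_number'])`.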
In [9]:
fuel_cells_df.drop('ambient_temperature', axis=1, inplace=True)
In [10]:
fuel_cells_df.isnull().sum()
Out[10]:
cycle                   0
datetime                0
capacity                0
voltage_measured        0
current_measured        0
temperature_measured    0
current_load            0
voltage_load            0
time                    0
flag                    0
dtype: int64

Univariate Analysis

In [11]:
def numerical_feature(df, colname, nrows=2, ncols=2, width=15, height=70):
    # squeeze=False keeps `ax` two-dimensional even when only one column is plotted
    fig, ax = plt.subplots(nrows, ncols, figsize=(width, height), squeeze=False)
    for rows, var in enumerate(colname):
        # Left panel: boxplot of the feature
        sns.boxplot(y=df[var], color='crimson', ax=ax[rows][0])
        ax[rows][0].set_title('Boxplot for Outlier Detection', fontweight='bold')
        ax[rows][0].set_ylabel(var, fontsize=12)
        # Right panel: histogram with mean, median and mode marked
        sns.histplot(df[var], color='purple', ax=ax[rows][1], kde=True)
        ax[rows][1].axvline(df[var].mean(), color='r', linestyle='--', label='Mean')
        ax[rows][1].axvline(df[var].median(), color='m', linestyle='-', label='Median')
        ax[rows][1].axvline(df[var].mode()[0], color='b', linestyle='-', label='Mode')
        ax[rows][1].set_title('Distribution with Mean, Median and Mode', fontweight='bold')
        ax[rows][1].legend()  # uses the labels set on the axvline calls

    plt.show()
In [12]:
fuel_cells_df.columns
Out[12]:
Index(['cycle', 'datetime', 'capacity', 'voltage_measured', 'current_measured',
       'temperature_measured', 'current_load', 'voltage_load', 'time', 'flag'],
      dtype='object')
In [13]:
numerical_cols = ['capacity', 'voltage_load']

numerical_feature(fuel_cells_df, numerical_cols, len(numerical_cols), 2, 15, 10)
No description has been provided for this image
In [14]:
categorical_cols = ['flag']

plt.figure(figsize=(7, 8))
count = fuel_cells_df['flag'].value_counts(normalize=True) * 100
count.plot.pie(colors=sns.color_palette('bright'), autopct='%0.0f%%',
               textprops={'fontsize': 14}, shadow=True)
plt.show()
No description has been provided for this image

Bivariate Analysis

  1. Capacity with respect to Cycle
In [15]:
plt.figure(figsize=(15, 7))
sns.lineplot(x='cycle', y='capacity', data=fuel_cells_df, 
             palette=['r', 'g', 'b', 'y'], hue='flag')
plt.show()
No description has been provided for this image
  • As the number of cycles increases, the capacity of every battery shows a decreasing trend.
  2. Datetime with respect to time
In [16]:
plt.figure(figsize=(15, 4))
sns.scatterplot(x='datetime', y='time', data=fuel_cells_df, 
                palette=['r', 'g', 'b', 'y'], hue='flag')
plt.show()
No description has been provided for this image
  • The batteries were not tested continuously throughout the cycles.
  • The first 3 batteries (5, 6, 7) were tested synchronously, whereas the 4th battery (B0018) was tested after July 2008 with a couple of breaks.
  • The general trend is more or less the same for all batteries: the duration of a discharge cycle decreases over time.
  3. Datetime with respect to capacity
In [17]:
plt.figure(figsize=(15, 7))
sns.scatterplot(x='datetime', y='capacity', data=fuel_cells_df, 
                palette=['r', 'g', 'b', 'y'], hue='flag')
plt.show()
No description has been provided for this image
  • Capacity decreases over time.
  • Testing of battery B0018 started later than the others, and its capacity is average compared with the other batteries over time.
  • The capacity of battery B0006 deteriorates the most over time, whereas B0007 retains its capacity best.
  4. Capacity with respect to time
In [18]:
plt.figure(figsize=(15, 7))
sns.lineplot(x='capacity', y='time', data=fuel_cells_df, 
                palette=['r', 'g', 'b', 'y'], hue='flag')
plt.show()
No description has been provided for this image
  • Cycles with a longer discharge time also have a higher capacity, for all the batteries.
  5. Temperature with respect to time
In [19]:
plt.figure(figsize=(15, 7))
sns.lineplot(x='temperature_measured', y='time', data=fuel_cells_df, 
                palette=['r', 'g', 'b', 'y'], hue='flag')
plt.show()
No description has been provided for this image
  • As time increases, temperature_measured rises in every cycle, which contributes to battery degradation.
  • Battery B0006 shows the largest number of outliers.
  6. Voltage measured with respect to time
In [20]:
plt.figure(figsize=(15, 7))
sns.lineplot(x='voltage_measured', y='time', data=fuel_cells_df, 
                palette=['r', 'g', 'b', 'y'], hue='flag')
plt.show()
No description has been provided for this image
  • Battery B0006 shows the largest number of outliers.
  7. Current load with respect to time
In [21]:
plt.figure(figsize=(15, 7))
# Cast the integer 'flag' to str: with integer hue levels, this seaborn/matplotlib
# combination raised "AttributeError: 'numpy.int64' object has no attribute 'startswith'"
# while building the legend.
sns.barplot(x='current_load', y='time',
            data=fuel_cells_df.assign(flag=fuel_cells_df['flag'].astype(str)),
            palette=['r', 'g', 'b', 'y'], hue='flag')
plt.show()
No description has been provided for this image
  • Battery B0006 shows the largest number of outliers.
  8. Correlation between various features
In [22]:
fig, ax = plt.subplots(figsize = (20,10))
sns.heatmap(fuel_cells_df.corr(method='spearman'), annot=True)
plt.show()
No description has been provided for this image
  • cycle & capacity -> negatively correlated.
  • voltage_load and voltage_measured -> negatively correlated.
  • voltage_measured and time -> positively correlated.
  • temperature and time -> positively correlated.
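Spearman correlation works on ranks, so it measures how monotonic a relationship is rather than how linear it is. The sketch below illustrates the difference from Pearson correlation on synthetic data; the column names are made up for this example.

```python
import numpy as np
import pandas as pd

x = np.arange(1, 21, dtype=float)
toy = pd.DataFrame({
    'x': x,
    'monotonic_nonlinear': np.exp(x / 4.0),  # strictly increasing, far from linear
    'decreasing': -x ** 3,                   # strictly decreasing
})

spearman = toy.corr(method='spearman')
pearson = toy.corr(method='pearson')

# Spearman reports a perfect monotonic relationship (+1 or -1),
# while Pearson is pulled below 1 by the curvature.
print(spearman.round(3))
print(pearson.round(3))
```

A strongly negative Spearman value, such as the one reported for cycle and capacity, therefore indicates a consistent downward trend without implying that the decline is linear.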

Visual analysis using traditional methods of anomaly detection

  1. IQR
  2. DBSCAN

IQR-based anomaly detection

In [23]:
num_cols_1 = ['capacity', 'voltage_measured', 'voltage_load']
num_cols_2 = ['current_measured', 'current_load']
temp = ['temperature_measured']
num_cols_3 = ['cycle', 'time']
In [24]:
Q1 = fuel_cells_df.quantile(0.25)
Q3 = fuel_cells_df.quantile(0.75)

IQR = Q3 - Q1
print(IQR)
cycle                            23215.0
datetime                23 days 02:10:03
capacity                        0.315825
voltage_measured                0.278098
current_measured                0.021444
temperature_measured            5.850056
current_load                      0.0008
voltage_load                       0.308
time                            1542.688
flag                                 2.0
dtype: object
In [25]:
fuel_cells_df[num_cols_1].boxplot(figsize=(10, 5))
plt.show()
No description has been provided for this image
  • voltage_measured and voltage_load have many outliers below the lower whisker.
In [26]:
fuel_cells_df[num_cols_2].boxplot(figsize=(10, 5))
plt.show()
No description has been provided for this image
  • The current values fall in a narrow range, so the IQR is very small; even so, some outliers are present for current_measured and current_load.
In [27]:
fuel_cells_df[num_cols_3].boxplot(figsize=(10, 5))
plt.show()
No description has been provided for this image
  • The remaining features have very few outliers.
In [28]:
num_cols = ['cycle', 'capacity', 'voltage_measured', 'current_measured',
                  'temperature_measured', 'current_load', 'voltage_load', 'time']

fuel_cells_df1 = fuel_cells_df.copy()

Outlier treatment using the IQR-based method

In [29]:
# Drop rows where any numeric column falls outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]
lower, upper = (Q1 - 1.5 * IQR)[num_cols], (Q3 + 1.5 * IQR)[num_cols]
is_outlier = ((fuel_cells_df1[num_cols] < lower) | (fuel_cells_df1[num_cols] > upper)).any(axis=1)
fuel_cells_df1 = fuel_cells_df1[~is_outlier].reset_index(drop=True)
In [30]:
fuel_cells_df1[num_cols_1].boxplot(figsize = (10,5))
plt.show()
No description has been provided for this image
In [31]:
fuel_cells_df1[num_cols_2].boxplot(figsize = (10,3))
plt.show()
No description has been provided for this image
In [32]:
fuel_cells_df1[temp].boxplot(figsize = (10,3))
plt.show()
No description has been provided for this image
In [33]:
fuel_cells_df1[num_cols_3].boxplot(figsize = (10,5))
plt.show()
No description has been provided for this image
In [34]:
fuel_cells_df1.shape[0] - fuel_cells_df.shape[0]
Out[34]:
-37766
  • In total, 37,766 data points are eliminated by the IQR-based method.
  • The removed rows might contain useful data as well, so it is better to detect and treat outliers with more robust, state-of-the-art anomaly detection methods.
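The IQR rule can be sketched on a single toy series as below. Besides flagging points outside the fences, the sketch shows clipping values to the fences as one alternative to dropping whole rows; clipping is a different technique from the row removal used in this notebook, and both `iqr_bounds` and the toy values are assumptions made for illustration.

```python
import pandas as pd


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, q3 = values.quantile(0.25), values.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Illustrative voltages; the last value is far below the rest.
toy_voltage = pd.Series([3.5, 3.6, 3.7, 3.8, 3.9, 4.0, 4.1, 2.0])

lower, upper = iqr_bounds(toy_voltage)
outlier_mask = (toy_voltage < lower) | (toy_voltage > upper)

# Alternative to dropping: pull extreme values back to the fences,
# which keeps every row (and its other columns) in the dataset.
clipped = toy_voltage.clip(lower=lower, upper=upper)

print('fences:', lower, upper)
print('flagged values:', toy_voltage[outlier_mask].tolist())
```

Whether to drop or clip depends on whether the extreme readings are measurement errors or genuine behaviour such as the voltage drop at the end of a discharge.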
In [35]:
fuel_cells_df_copy1 = fuel_cells_df.copy()
In [36]:
fuel_cells_df['datetime'] = fuel_cells_df['datetime'].astype('int64')
fuel_cells_df['datetime'].head()

fuel_cells_df_copy2 = fuel_cells_df.copy()

DBSCAN-based anomaly detection

In [37]:
from sklearn.cluster import DBSCAN

cols = ['cycle', 'datetime', 'capacity', 'voltage_measured', 'current_measured',
        'temperature_measured', 'current_load', 'voltage_load', 'time', 'flag']

std_scaler = StandardScaler().fit(fuel_cells_df_copy2)
std_df = std_scaler.transform(fuel_cells_df_copy2)

dbsc = DBSCAN(eps=0.5, min_samples=200).fit(std_df)

labels = dbsc.labels_
In [38]:
out_df = fuel_cells_df_copy2.copy()
out_df['label'] = dbsc.labels_
In [39]:
count_df = pd.DataFrame(data = out_df['label'].value_counts().values, columns = ['label_counts'])
count_df
Out[39]:
label_counts
0 41652
1 39197
2 37765
3 31177
4 8009
5 5394
6 5292
7 5209
8 4825
9 4202
10 1991
11 1008
In [40]:
index_df = pd.DataFrame(out_df['label'].value_counts().index)
index_df
Out[40]:
label
0 7
1 1
2 4
3 9
4 -1
5 6
6 3
7 0
8 5
9 2
10 10
11 8
In [41]:
label_counts_df = pd.concat([index_df,count_df], axis = 1)
label_counts_df
Out[41]:
label label_counts
0 7 41652
1 1 39197
2 4 37765
3 9 31177
4 -1 8009
5 6 5394
6 3 5292
7 0 5209
8 5 4825
9 2 4202
10 10 1991
11 8 1008
In [42]:
out_df
Out[42]:
cycle datetime capacity voltage_measured current_measured temperature_measured current_load voltage_load time flag label
0 1 1207149941000000000 1.856487 4.191492 -0.004902 24.330034 -0.0006 0.000 0.000 1 -1
1 2 1207149941000000000 1.856487 4.190749 -0.001478 24.325993 -0.0006 4.206 16.781 1 -1
2 3 1207149941000000000 1.856487 3.974871 -2.012528 24.389085 -1.9982 3.062 35.703 1 0
3 4 1207149941000000000 1.856487 3.951717 -2.013979 24.544752 -1.9982 3.030 53.781 1 0
4 5 1207149941000000000 1.856487 3.934352 -2.011144 24.731385 -1.9982 3.011 71.922 1 0
... ... ... ... ... ... ... ... ... ... ... ...
185716 34862 1219221439000000000 1.341051 3.443760 -0.002426 35.383979 0.0006 0.000 2686.359 4 10
185717 34863 1219221439000000000 1.341051 3.453271 -0.000981 35.179732 0.0006 0.000 2700.546 4 10
185718 34864 1219221439000000000 1.341051 3.461963 0.000209 34.977000 0.0006 0.000 2714.640 4 10
185719 34865 1219221439000000000 1.341051 3.469907 0.001516 34.785943 0.0006 0.000 2728.750 4 10
185720 34866 1219221439000000000 1.341051 3.477277 -0.001940 34.581660 0.0006 0.000 2742.843 4 10

185721 rows × 11 columns

In [43]:
plt.figure(figsize=(15, 7))
sns.barplot(x = 'label', y = 'label_counts', data = label_counts_df)
plt.xlabel('Labels')
plt.ylabel('Count')
plt.show()
No description has been provided for this image
  • When all features are considered, 8,009 data points receive the label -1 (noise).
  • Let's check further with visualization and with DBSCAN on two features at a time.
  • DBSCAN is slow on the full dataset, so the visualizations below use a sample of 10,000 rows.
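How DBSCAN assigns the noise label -1 can be seen on a small synthetic example. The sketch below builds two dense blobs plus two isolated points; the `eps` and `min_samples` values are chosen for this toy data only and are not the settings used in this notebook.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
cluster_a = rng.normal(loc=[0.0, 0.0], scale=0.1, size=(100, 2))
cluster_b = rng.normal(loc=[5.0, 5.0], scale=0.1, size=(100, 2))
isolated = np.array([[2.5, 20.0], [-10.0, 2.5]])  # far from both blobs
points = np.vstack([cluster_a, cluster_b, isolated])

# Standardize first so that eps means the same thing along every axis.
scaled = StandardScaler().fit_transform(points)
labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(scaled)

# Points that are not within eps of any core point get the label -1.
n_noise = int((labels == -1).sum())
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print('clusters:', n_clusters, 'noise points:', n_noise)
```

The number of noise points depends heavily on `eps` and `min_samples`, so a count such as the one above is a property of the chosen settings as much as of the data.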
In [44]:
plt.figure(figsize=(15, 7))
sns.scatterplot(x = 'time', y = 'temperature_measured', hue ='flag', data = out_df.sample(n = 10000, replace = True))
plt.xlabel('time of battery charging')
plt.ylabel('temperature_measured')
plt.show()
No description has been provided for this image
  • Hardly any outliers are visible in the plot of these two features.
  • Let's cross-check with another pair of features.
In [45]:
plt.figure(figsize=(15, 7))
sns.scatterplot(x = 'time', y = 'voltage_measured', hue ='flag', data = out_df.sample(n = 10000, replace = True))
plt.xlabel('time of battery charging')
plt.ylabel('voltage_measured')
plt.show()
No description has been provided for this image
In [46]:
df = fuel_cells_df_copy2[['time','voltage_measured']].sample(n = 10000, replace = True)

std_scaler = StandardScaler().fit(df)
std_df = std_scaler.transform(df)

dbsc = DBSCAN(eps = .5, min_samples = 200).fit(std_df)

labels = dbsc.labels_
In [47]:
out_df = df.copy()
out_df['label'] = dbsc.labels_

out_df['label'].value_counts()
Out[47]:
label
 0    9807
-1     193
Name: count, dtype: int64
In [48]:
plt.figure(figsize=(15, 7))
plt.scatter(out_df['time'], out_df['voltage_measured'], c=out_df['label'])
plt.xlabel('time of battery charging')
plt.ylabel('voltage_measured')
plt.show()
No description has been provided for this image
  • With DBSCAN, the noise points (label -1) are clearly visible in purple.
  • On this 10,000-row sample, 193 points are labeled as noise; the earlier run on all features labeled 8,009 points as noise.

MinMax Scaling

In [49]:
cols = ['cycle', 'datetime', 'capacity', 'voltage_measured', 'current_measured',
       'temperature_measured', 'current_load', 'voltage_load', 'time', 'flag']
mm_scaler = MinMaxScaler()

fuel_cells_df[cols] = mm_scaler.fit_transform(fuel_cells_df[cols])
fuel_cells_df.head()
Out[49]:
cycle datetime capacity voltage_measured current_measured temperature_measured current_load voltage_load time flag
0 0.00000 0.0 0.797111 0.983242 0.990600 0.099077 0.49985 0.000000 0.000000 0.0
1 0.00002 0.0 0.797111 0.982944 0.992276 0.098875 0.49985 0.989880 0.004547 0.0
2 0.00004 0.0 0.797111 0.896465 0.008109 0.102032 0.00045 0.720640 0.009675 0.0
3 0.00006 0.0 0.797111 0.887189 0.007399 0.109822 0.00045 0.713109 0.014574 0.0
4 0.00008 0.0 0.797111 0.880233 0.008787 0.119162 0.00045 0.708637 0.019490 0.0

Isolation Forest based anomaly detection

In [50]:
fuel_cells_df['flag'].unique()
Out[50]:
array([0.        , 0.33333333, 0.66666667, 1.        ])
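The unique `flag` values above are no longer 1 to 4 because min-max scaling maps each column to the range [0, 1] via x' = (x - min) / (max - min). A minimal sketch on the four flag values shows where 0, 1/3, 2/3 and 1 come from, and how the original values can be recovered:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

flags = np.array([[1.0], [2.0], [3.0], [4.0]])  # battery flags as a single column

scaler = MinMaxScaler()
scaled = scaler.fit_transform(flags)

# The same result computed by hand from the formula.
manual = (flags - flags.min()) / (flags.max() - flags.min())

# inverse_transform maps the scaled values back to the original flags.
restored = scaler.inverse_transform(scaled)

print(scaled.ravel())
print(restored.ravel())
```

Any later filter on the scaled frame therefore has to compare against the scaled values rather than against 1, 2, 3 or 4.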
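In [54]:
from sklearn.ensemble import IsolationForest

flag_df_final = pd.DataFrame()
contamination_arr = [0.01, 0.02, 0.03, 0.04]
# Fit on the original features only, so that predictions from an earlier
# contamination level are not fed back in as inputs
feature_cols = list(fuel_cells_df.columns)

for j in fuel_cells_df['flag'].unique():
    # .copy() avoids assigning new columns into a slice of fuel_cells_df
    flag_df = fuel_cells_df[fuel_cells_df['flag'] == j].copy()
    for i in contamination_arr:
        print(i, j)
        model = IsolationForest(n_estimators=200, contamination=i, random_state=42)
        # -1 marks an anomaly, 1 an inlier
        flag_df['anomaly_score_' + str(i)] = model.fit_predict(flag_df[feature_cols])
        # Lower decision_function values are more anomalous
        flag_df['scores_' + str(i)] = model.decision_function(flag_df[feature_cols])
    flag_df_final = pd.concat([flag_df, flag_df_final])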
0.01 0.0
0.02 0.0
0.03 0.0
0.04 0.0
0.01 0.3333333333333333
0.02 0.3333333333333333
0.03 0.3333333333333333
0.04 0.3333333333333333
0.01 0.6666666666666667
0.02 0.6666666666666667
0.03 0.6666666666666667
0.04 0.6666666666666667
0.01 1.0
0.02 1.0
0.03 1.0
0.04 1.0
In [55]:
flag_df_final['anomaly_score_0.01'].value_counts(ascending= True)
Out[55]:
anomaly_score_0.01
-1      1858
 1    183863
Name: count, dtype: int64
In [56]:
for i in contamination_arr:
    print(f'Anomalies with contamination {i}:',len(flag_df_final[flag_df_final['anomaly_score_'+str(i)] == -1]))
Anomalies with contamination 0.01: 1858
Anomalies with contamination 0.02: 3716
Anomalies with contamination 0.03: 5573
Anomalies with contamination 0.04: 7431
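The counts above grow almost exactly in proportion to `contamination` (1,858 is about 1% of the 185,721 rows). This is expected: in scikit-learn's IsolationForest the `contamination` parameter sets the score threshold so that roughly that fraction of the training data is flagged, whether or not the data contains real anomalies. A sketch on synthetic Gaussian data, used here only as a stand-in:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
X = rng.normal(size=(1000, 3))  # synthetic data without injected anomalies

flagged = {}
all_negative = {}
for contamination in (0.01, 0.04):
    model = IsolationForest(n_estimators=100, contamination=contamination,
                            random_state=42)
    pred = model.fit_predict(X)          # -1 = flagged, 1 = inlier
    scores = model.decision_function(X)  # flagged rows have negative scores
    flagged[contamination] = int((pred == -1).sum())
    all_negative[contamination] = bool((scores[pred == -1] < 0).all())
    print(contamination, flagged[contamination])
```

The number of flagged rows is therefore a consequence of the chosen `contamination`, and the `decision_function` scores are the more informative quantity when ranking how unusual a point is.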
In [57]:
fuel_cells_df_final = pd.concat([fuel_cells_df_copy2, flag_df_final.drop(columns=fuel_cells_df.columns)], axis = 1)
In [58]:
df_final_copy = fuel_cells_df_final.copy()

Plotting the anomaly scores to visualize their distribution

In [59]:
plt.figure(figsize=(15, 7))
sns.histplot(x = 'scores_0.01',hue = 'flag', palette=['r','g','b','y'], data = df_final_copy, bins = 100, kde = True)
plt.show()
No description has been provided for this image
In [60]:
plt.figure(figsize=(15, 7))
sns.histplot(x = 'scores_0.01',hue = 'anomaly_score_0.01', palette=['r','b'], data = df_final_copy, bins = 100, kde = True)
plt.show()
No description has been provided for this image
In [61]:
df_final_copy[df_final_copy['anomaly_score_0.01']==-1]['flag'].value_counts()
Out[61]:
flag
1    503
2    503
3    503
4    349
Name: count, dtype: int64
In [62]:
df_final_copy[df_final_copy['anomaly_score_0.04']==-1]['flag'].value_counts()
Out[62]:
flag
1    2012
2    2012
3    2012
4    1395
Name: count, dtype: int64
In [63]:
data = df_final_copy[df_final_copy['anomaly_score_0.01']==-1]
In [64]:
wd = df_final_copy.loc[(df_final_copy['anomaly_score_0.01'] == -1) & (df_final_copy['flag'] == 2)]
len(wd)
Out[64]:
503
In [65]:
# Thresholding on scores (< -0.04)

wd2 = df_final_copy.loc[(df_final_copy['anomaly_score_0.01'] == -1) & (df_final_copy['flag'] == 2) & (df_final_copy['scores_0.01']<-0.04)]
len(wd2)
Out[65]:
298
In [66]:
# Thresholding on scores (< -0.05)

wd3 = df_final_copy.loc[(df_final_copy['anomaly_score_0.01'] == -1) & (df_final_copy['flag'] == 2) & (df_final_copy['scores_0.01']<-0.05)]
len(wd3)
Out[66]:
205
In [67]:
plt.figure(figsize=(15, 7))
plt.scatter(df_final_copy[(df_final_copy['anomaly_score_0.01'] == 1) & (df_final_copy['flag'] == 2)]['capacity'],
           df_final_copy[(df_final_copy['anomaly_score_0.01'] == 1) & (df_final_copy['flag'] == 2)]['temperature_measured'], c = 'blue', marker = "*")

plt.scatter(wd['capacity'], wd['temperature_measured'], c = 'orange', marker = 'o')
plt.scatter(wd2['capacity'], wd2['temperature_measured'], c = 'red', marker = 'o')
plt.scatter(wd3['capacity'], wd3['temperature_measured'], c = 'black', marker = 'o')

plt.xlabel('capacity')
plt.ylabel('temperature_measured')
plt.show()
No description has been provided for this image
  • The most critical outliers occur at both very high and very low capacities, in combination with the lowest measured temperatures.
  • More generally, data points with the lowest measured temperatures account for most of the outliers.
In [68]:
plt.figure(figsize=(15, 7))
plt.scatter(df_final_copy[(df_final_copy['anomaly_score_0.01'] == 1) & (df_final_copy['flag'] == 2)]['cycle'],
           df_final_copy[(df_final_copy['anomaly_score_0.01'] == 1) & (df_final_copy['flag'] == 2)]['temperature_measured'], c = 'blue', marker = "*")

plt.scatter(wd['cycle'], wd['temperature_measured'], c = 'orange', marker = 'o')
plt.scatter(wd2['cycle'], wd2['temperature_measured'], c = 'red', marker = 'o')
plt.scatter(wd3['cycle'], wd3['temperature_measured'], c = 'black', marker = 'o')

plt.xlabel('cycle')
plt.ylabel('temperature_measured')
plt.show()
No description has been provided for this image
  • Data points with the lowest measured temperatures account for most of the outliers.
In [69]:
plt.figure(figsize=(15, 7))
plt.scatter(df_final_copy[(df_final_copy['anomaly_score_0.01'] == 1) & (df_final_copy['flag'] == 2)]['time'],
           df_final_copy[(df_final_copy['anomaly_score_0.01'] == 1) & (df_final_copy['flag'] == 2)]['voltage_measured'], c = 'blue', marker = "*")

plt.scatter(wd['time'], wd['voltage_measured'], c = 'orange', marker = 'o')
plt.scatter(wd2['time'], wd2['voltage_measured'], c = 'red', marker = 'o')
plt.scatter(wd3['time'], wd3['voltage_measured'], c = 'black', marker = 'o')

plt.xlabel('time')
plt.ylabel('voltage_measured')
plt.show()
No description has been provided for this image
  • Data points showing the highest voltages at very small values of time are certainly outliers/anomalies.
  • Also, at the lowest contamination level, the data points with the largest values of time are flagged as outliers/anomalies.
In [70]:
# Testing with multiple values such as with  anomaly_score_0.04

wd = df_final_copy.loc[(df_final_copy['anomaly_score_0.04'] == -1) & (df_final_copy['flag'] == 2)]
wd2 = df_final_copy.loc[(df_final_copy['anomaly_score_0.04'] == -1) & (df_final_copy['flag'] == 2) & (df_final_copy['scores_0.04']<-0.04)]
wd3 = df_final_copy.loc[(df_final_copy['anomaly_score_0.04'] == -1) & (df_final_copy['flag'] == 2) & (df_final_copy['scores_0.04']<-0.05)]
In [71]:
plt.figure(figsize=(15, 7))
plt.scatter(df_final_copy[(df_final_copy['anomaly_score_0.04'] == 1) & (df_final_copy['flag'] == 2)]['time'],
           df_final_copy[(df_final_copy['anomaly_score_0.04'] == 1) & (df_final_copy['flag'] == 2)]['voltage_measured'], c = 'blue', marker = "*")

plt.scatter(wd['time'], wd['voltage_measured'], c = 'orange', marker = 'o')
plt.scatter(wd2['time'], wd2['voltage_measured'], c = 'red', marker = 'o')
plt.scatter(wd3['time'], wd3['voltage_measured'], c = 'black', marker = 'o')

plt.xlabel('time')
plt.ylabel('voltage_measured')
plt.show()
No description has been provided for this image
  • Increasing the contamination from 1% to 4% lets the model classify the visually apparent anomalies as outliers, which the 1% setting did not.
  • Also, at the maximum contamination of 4%, the data points with the largest values of time and low voltages are flagged as outliers/anomalies.
In [72]:
flags_arr = [1,2,3,4]
contamination_arr = [0.01,0.02,0.03,0.04]

for i in flags_arr:
    for j in contamination_arr:
        wd = df_final_copy.loc[(df_final_copy['anomaly_score_'+str(j)] == -1) & (df_final_copy['flag'] == i)]
        wd2 = df_final_copy.loc[(df_final_copy['anomaly_score_'+str(j)] == -1) & (df_final_copy['flag'] == i) & (df_final_copy['scores'+'_'+str(j)]<-0.04)]
        wd3 = df_final_copy.loc[(df_final_copy['anomaly_score_'+str(j)] == -1) & (df_final_copy['flag'] == i) & (df_final_copy['scores'+'_'+str(j)]<-0.06)]
        
        plt.figure(figsize=(8, 6))
        plt.scatter(df_final_copy[(df_final_copy['anomaly_score_'+str(j)] == 1) & (df_final_copy['flag'] == i)]['capacity'],
                   df_final_copy[(df_final_copy['anomaly_score_'+str(j)] == 1) & (df_final_copy['flag'] == i)]['temperature_measured'], c = 'blue', marker = "*")

        plt.scatter(wd['capacity'], wd['temperature_measured'], c = 'orange', marker = 'o')
        plt.scatter(wd2['capacity'], wd2['temperature_measured'], c = 'red', marker = 'o')
        plt.scatter(wd3['capacity'], wd3['temperature_measured'], c = 'black', marker = 'o')

        plt.xlabel('capacity')
        plt.ylabel('temperature_measured')
        plt.title('Battery with flag ' + str(i)+ ' having contamination of '+str(j))
        plt.show()
  • The batteries with the lowest measured temperatures have more outliers.
  • Batteries with very high or very low capacities combined with the lowest measured temperature have the most critical outliers.
  • As the contamination % increases, more data points are classified as extreme outliers.
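The orange, red and black groups in the plots above correspond to increasingly negative anomaly scores. A minimal sketch of that tiering is given below, assuming the `scores_*` columns hold the output of `decision_function`; the synthetic data and the variable names are hypothetical, and only the thresholds -0.04 and -0.06 are taken from the cell above.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Synthetic stand-in data (assumption), not the battery measurements.
rng = np.random.default_rng(1)
X = np.vstack([
    rng.normal(0.0, 1.0, size=(950, 2)),
    rng.uniform(-8.0, 8.0, size=(50, 2)),
])

model = IsolationForest(contamination=0.04, random_state=0).fit(X)
labels = model.predict(X)            # -1 = outlier, 1 = inlier
scores = model.decision_function(X)  # negative values are more anomalous

outlier = labels == -1               # orange in the plots
strong = outlier & (scores < -0.04)  # red
extreme = outlier & (scores < -0.06) # black

print("outliers:", int(outlier.sum()))
print("score < -0.04:", int(strong.sum()))
print("score < -0.06:", int(extreme.sum()))
```

Each tier is a subset of the previous one, so plotting them in this order paints the most anomalous points last.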

Local Outlier Factor (LOF) based anomaly detection

In [73]:
lof_df = fuel_cells_df_copy2.copy()

n_neighbors = [5, 20]
contamination_arr = [0.01,0.04]
flags_arr = [1,2,3,4]
In [74]:
from sklearn.neighbors import LocalOutlierFactor

# NOTE: the model is fit on the full dataframe in every iteration, so `f` only
# changes the plot title; for a given (n, c) the plots are identical across flags.
for f in flags_arr:
    for n in n_neighbors:
        for c in contamination_arr:
            # model specification
            model1 = LocalOutlierFactor(n_neighbors = n, metric = "manhattan", contamination = c)
            # model fitting: fit_predict returns -1 for outliers and 1 for inliers
            y_pred = model1.fit_predict(fuel_cells_df_copy2)
            # positional indices of the outliers
            outlier_index = np.where(y_pred == -1)[0]
            # filter outlier values
            outlier_values = fuel_cells_df_copy2.iloc[outlier_index]
            # plot data
            plt.scatter(fuel_cells_df_copy2["capacity"], fuel_cells_df_copy2["temperature_measured"], color = "b", s = 65)
            # plot outlier values
            plt.scatter(outlier_values["capacity"], outlier_values["temperature_measured"], color = "r")
            plt.xlabel('capacity')
            plt.ylabel('temperature_measured')
            plt.title('Battery with flag ' + str(f)+ ' with n_neighbors = '+ str(n)+' having contamination of '+str(c))
            plt.show()
  • 7,429 data points (4% of 185,721) are classified as outliers with the largest number of nearest neighbors (20) and the highest contamination of 4%.
  • With the fewest nearest neighbors (5) and the lowest contamination (1%), fewer outliers are flagged, specifically in the mid-battery capacity range (1.4-1.6) at the lowest and highest measured temperatures. These can be reported to a subject matter expert and validated before removal.
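The loop above fits LOF on the full frame for every value of `f`. A per-battery variant, which the plot titles suggest, might look like the sketch below. The synthetic frame `demo`, the helper `lof_labels_per_flag` and the scaling step are assumptions made for illustration; scaling is included because the Manhattan distance is otherwise dominated by the column with the largest numeric range.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import LocalOutlierFactor
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in for the battery frame: two features plus a battery flag.
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "capacity": rng.normal(1.6, 0.15, 400),
    "temperature_measured": rng.normal(30.0, 4.0, 400),
    "flag": np.repeat([1, 2], 200),
})
features = ["capacity", "temperature_measured"]

def lof_labels_per_flag(df, n_neighbors, contamination):
    """Fit one LOF model per battery flag and return -1/1 labels aligned to df."""
    labels = pd.Series(1, index=df.index, name="y_pred")
    for _, group in df.groupby("flag"):
        X = StandardScaler().fit_transform(group[features])
        model = LocalOutlierFactor(
            n_neighbors=n_neighbors, metric="manhattan", contamination=contamination
        )
        labels.loc[group.index] = model.fit_predict(X)
    return labels

for n in (5, 20):
    y = lof_labels_per_flag(demo, n_neighbors=n, contamination=0.04)
    per_flag = (y == -1).groupby(demo["flag"]).sum()
    print(f"n_neighbors={n}: outliers per flag = {per_flag.to_dict()}")
```

Fitting per flag keeps one battery's operating range from defining what counts as a dense neighborhood for another.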
In [75]:
len(y_pred)
Out[75]:
185721
In [76]:
y_pred_df = pd.DataFrame(data=y_pred,columns=['y_pred'])
fuel_cells_df_copy2_final_lof = pd.concat([fuel_cells_df_copy2,y_pred_df],axis = 1)
fuel_cells_df_copy2_final_lof
Out[76]:
cycle datetime capacity voltage_measured current_measured temperature_measured current_load voltage_load time flag y_pred
0 1 1207149941000000000 1.856487 4.191492 -0.004902 24.330034 -0.0006 0.000 0.000 1 -1
1 2 1207149941000000000 1.856487 4.190749 -0.001478 24.325993 -0.0006 4.206 16.781 1 -1
2 3 1207149941000000000 1.856487 3.974871 -2.012528 24.389085 -1.9982 3.062 35.703 1 -1
3 4 1207149941000000000 1.856487 3.951717 -2.013979 24.544752 -1.9982 3.030 53.781 1 -1
4 5 1207149941000000000 1.856487 3.934352 -2.011144 24.731385 -1.9982 3.011 71.922 1 1
... ... ... ... ... ... ... ... ... ... ... ...
185716 34862 1219221439000000000 1.341051 3.443760 -0.002426 35.383979 0.0006 0.000 2686.359 4 -1
185717 34863 1219221439000000000 1.341051 3.453271 -0.000981 35.179732 0.0006 0.000 2700.546 4 -1
185718 34864 1219221439000000000 1.341051 3.461963 0.000209 34.977000 0.0006 0.000 2714.640 4 -1
185719 34865 1219221439000000000 1.341051 3.469907 0.001516 34.785943 0.0006 0.000 2728.750 4 -1
185720 34866 1219221439000000000 1.341051 3.477277 -0.001940 34.581660 0.0006 0.000 2742.843 4 -1

185721 rows × 11 columns

In [77]:
# No. of outliers at the highest contamination, i.e. 4%

fuel_cells_df_copy2_final_lof[fuel_cells_df_copy2_final_lof['y_pred'] == -1]
Out[77]:
cycle datetime capacity voltage_measured current_measured temperature_measured current_load voltage_load time flag y_pred
0 1 1207149941000000000 1.856487 4.191492 -0.004902 24.330034 -0.0006 0.000 0.000 1 -1
1 2 1207149941000000000 1.856487 4.190749 -0.001478 24.325993 -0.0006 4.206 16.781 1 -1
2 3 1207149941000000000 1.856487 3.974871 -2.012528 24.389085 -1.9982 3.062 35.703 1 -1
3 4 1207149941000000000 1.856487 3.951717 -2.013979 24.544752 -1.9982 3.030 53.781 1 -1
193 194 1207149941000000000 1.856487 3.264121 -0.001532 35.021738 -0.0006 0.000 3628.953 1 -1
... ... ... ... ... ... ... ... ... ... ... ...
185716 34862 1219221439000000000 1.341051 3.443760 -0.002426 35.383979 0.0006 0.000 2686.359 4 -1
185717 34863 1219221439000000000 1.341051 3.453271 -0.000981 35.179732 0.0006 0.000 2700.546 4 -1
185718 34864 1219221439000000000 1.341051 3.461963 0.000209 34.977000 0.0006 0.000 2714.640 4 -1
185719 34865 1219221439000000000 1.341051 3.469907 0.001516 34.785943 0.0006 0.000 2728.750 4 -1
185720 34866 1219221439000000000 1.341051 3.477277 -0.001940 34.581660 0.0006 0.000 2742.843 4 -1

7429 rows × 11 columns

In [78]:
plt.figure(figsize=(15, 7))
plt.scatter(fuel_cells_df_copy2_final_lof['time'], fuel_cells_df_copy2_final_lof['voltage_measured'], c=fuel_cells_df_copy2_final_lof['y_pred'])
plt.xlabel('time of battery charging')
plt.ylabel('voltage_measured')
plt.show()
  • Validated visually in two dimensions with the above hyperparameters: LOF appears less accurate than Isolation Forest but more accurate than the other methods.

Elliptical Envelope based Anomaly Detection

In [79]:
eliptical_df = fuel_cells_df_copy2.copy()
In [80]:
from sklearn.preprocessing import StandardScaler
cols = ['cycle', 'datetime', 'capacity', 'voltage_measured', 'current_measured',
       'temperature_measured', 'current_load', 'voltage_load', 'time', 'flag']
scaler = StandardScaler()

eliptical_df[cols] = scaler.fit_transform(eliptical_df[cols])
eliptical_df.head()
Out[80]:
cycle datetime capacity voltage_measured current_measured temperature_measured current_load voltage_load time flag
0 -1.678666 -1.424022 1.477315 2.758442 3.255533 -1.998389 -1.194938 -3.149552 -1.705022 -1.281811
1 -1.678595 -1.424022 1.477315 2.755491 3.261632 -1.999392 -1.194938 2.448187 -1.686519 -1.281811
2 -1.678524 -1.424022 1.477315 1.897777 -0.320553 -1.983728 -2.823146 0.925645 -1.665656 -1.281811
3 -1.678453 -1.424022 1.477315 1.805782 -0.323138 -1.945079 -2.823146 0.883056 -1.645723 -1.281811
4 -1.678383 -1.424022 1.477315 1.736792 -0.318086 -1.898742 -2.823146 0.857769 -1.625721 -1.281811
In [82]:
from sklearn.model_selection import train_test_split

X_train, X_test= train_test_split(eliptical_df, test_size=0.33, random_state=42)

data = X_train[cols].values
In [83]:
from sklearn.covariance import EllipticEnvelope

model1 = EllipticEnvelope(contamination = 0.1) 
model1.fit(data)
Out[83]:
EllipticEnvelope()
In [84]:
# new data for prediction (data needs to be in arrays)
new_data = X_test[cols].values

# predict on new data 
pred1 = model1.predict(new_data)
print(pred1)
[1 1 1 ... 1 1 1]
In [85]:
len(new_data), len(pred1)
Out[85]:
(61288, 61288)
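In [86]:
pred_df = pd.DataFrame(data=pred1,columns=['pred1'])
# X_test keeps the shuffled row labels from the split, while pred_df is indexed 0..n-1;
# reset the index so that concat pairs each row with its own prediction by position
fuel_cells_df_copy2_final_eliptical = pd.concat([X_test.reset_index(drop=True),pred_df],axis = 1)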

pred_df
Out[86]:
pred1
0 1
1 1
2 1
3 1
4 1
... ...
61283 1
61284 1
61285 1
61286 1
61287 1

61288 rows × 1 columns

In [87]:
fuel_cells_df_copy2_final_eliptical[fuel_cells_df_copy2_final_eliptical['pred1'] == -1].shape
Out[87]:
(5943, 11)
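When predictions are attached to a shuffled split such as `X_test`, it is worth remembering that `pd.concat(..., axis=1)` aligns rows on index labels, not on position. The toy frame below is hypothetical and only illustrates the difference between label alignment and positional assignment.

```python
import numpy as np
import pandas as pd

# Toy data (assumption): five rows, of which three form a shuffled "test split".
df = pd.DataFrame({"voltage_measured": [4.19, 3.97, 3.95, 3.93, 3.92]})
X_test = df.iloc[[4, 0, 3]]   # row labels 4, 0, 3, as a shuffled split would give
pred = np.array([1, -1, 1])   # hypothetical predictions, in X_test row order

# Label alignment: the prediction frame is indexed 0, 1, 2, so rows are matched
# by label and the result is padded with NaN where labels do not overlap.
misaligned = pd.concat([X_test, pd.DataFrame({"pred1": pred})], axis=1)

# Positional assignment: each prediction stays with the row it was computed for.
aligned = X_test.assign(pred1=pred)

print(misaligned)
print(aligned)
```

The outlier count is the same either way, but only the positional version keeps each flag on the row it belongs to.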
In [88]:
all_data = eliptical_df[cols].values
pred2 = model1.predict(all_data)
print(pred2)
[ 1  1 -1 ... -1 -1 -1]
In [89]:
len(pred2), len(all_data)
Out[89]:
(185721, 185721)
In [90]:
pred_df_all = pd.DataFrame(data=pred2,columns=['pred2'])
final_eliptical = pd.concat([fuel_cells_df_copy2,pred_df_all],axis = 1)
final_eliptical
Out[90]:
cycle datetime capacity voltage_measured current_measured temperature_measured current_load voltage_load time flag pred2
0 1 1207149941000000000 1.856487 4.191492 -0.004902 24.330034 -0.0006 0.000 0.000 1 1
1 2 1207149941000000000 1.856487 4.190749 -0.001478 24.325993 -0.0006 4.206 16.781 1 1
2 3 1207149941000000000 1.856487 3.974871 -2.012528 24.389085 -1.9982 3.062 35.703 1 -1
3 4 1207149941000000000 1.856487 3.951717 -2.013979 24.544752 -1.9982 3.030 53.781 1 -1
4 5 1207149941000000000 1.856487 3.934352 -2.011144 24.731385 -1.9982 3.011 71.922 1 -1
... ... ... ... ... ... ... ... ... ... ... ...
185716 34862 1219221439000000000 1.341051 3.443760 -0.002426 35.383979 0.0006 0.000 2686.359 4 -1
185717 34863 1219221439000000000 1.341051 3.453271 -0.000981 35.179732 0.0006 0.000 2700.546 4 -1
185718 34864 1219221439000000000 1.341051 3.461963 0.000209 34.977000 0.0006 0.000 2714.640 4 -1
185719 34865 1219221439000000000 1.341051 3.469907 0.001516 34.785943 0.0006 0.000 2728.750 4 -1
185720 34866 1219221439000000000 1.341051 3.477277 -0.001940 34.581660 0.0006 0.000 2742.843 4 -1

185721 rows × 11 columns

In [91]:
final_eliptical[final_eliptical['pred2'] == -1]
Out[91]:
cycle datetime capacity voltage_measured current_measured temperature_measured current_load voltage_load time flag pred2
2 3 1207149941000000000 1.856487 3.974871 -2.012528 24.389085 -1.9982 3.062 35.703 1 -1
3 4 1207149941000000000 1.856487 3.951717 -2.013979 24.544752 -1.9982 3.030 53.781 1 -1
4 5 1207149941000000000 1.856487 3.934352 -2.011144 24.731385 -1.9982 3.011 71.922 1 -1
5 6 1207149941000000000 1.856487 3.920058 -2.013007 24.909816 -1.9982 2.991 90.094 1 -1
6 7 1207149941000000000 1.856487 3.907904 -2.014400 25.105884 -1.9982 2.977 108.281 1 -1
... ... ... ... ... ... ... ... ... ... ... ...
185716 34862 1219221439000000000 1.341051 3.443760 -0.002426 35.383979 0.0006 0.000 2686.359 4 -1
185717 34863 1219221439000000000 1.341051 3.453271 -0.000981 35.179732 0.0006 0.000 2700.546 4 -1
185718 34864 1219221439000000000 1.341051 3.461963 0.000209 34.977000 0.0006 0.000 2714.640 4 -1
185719 34865 1219221439000000000 1.341051 3.469907 0.001516 34.785943 0.0006 0.000 2728.750 4 -1
185720 34866 1219221439000000000 1.341051 3.477277 -0.001940 34.581660 0.0006 0.000 2742.843 4 -1

18387 rows × 11 columns

In [92]:
plt.figure(figsize=(15, 7))
plt.scatter(final_eliptical['time'], final_eliptical['voltage_measured'], c=final_eliptical['pred2'])
plt.xlabel('time of battery charging')
plt.ylabel('voltage_measured')
plt.show()
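For context on what the model is doing: scikit-learn's `EllipticEnvelope` fits a robust Gaussian estimate and labels points by their squared Mahalanobis distance from it, with the cut-off chosen so that roughly a `contamination` share of the training points falls outside. The sketch below illustrates this on synthetic Gaussian data; the data and variable names are assumptions and are not the battery frame.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope

# Synthetic correlated Gaussian data (assumption).
rng = np.random.default_rng(0)
X = rng.multivariate_normal(
    mean=[0.0, 0.0], cov=[[1.0, 0.6], [0.6, 1.0]], size=500
)

env = EllipticEnvelope(contamination=0.1, random_state=0).fit(X)
labels = env.predict(X)   # -1 = outlier, 1 = inlier
d2 = env.mahalanobis(X)   # squared Mahalanobis distance to the robust fit

flagged_fraction = float((labels == -1).mean())
print(f"flagged fraction: {flagged_fraction:.3f}")
print(f"smallest flagged distance: {d2[labels == -1].min():.3f}")
print(f"largest inlier distance:   {d2[labels == 1].max():.3f}")
```

Because the decision is a single distance threshold around one ellipse, the method suits roughly Gaussian, unimodal data; data pooled from several batteries and operating modes may not fit that shape well.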

Comparison

Isolation Forest >> Local Outlier Factor (LOF) > DBSCAN (density based) > Elliptical Envelope >> IQR based
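The ranking above comes from visual inspection. As a complementary check that needs no ground-truth labels, one could measure how much two detectors agree on the rows they flag. The helper `flag_agreement` and the toy label arrays below are hypothetical; in the notebook the inputs would be the -1/1 label columns produced by each method.

```python
import numpy as np

def flag_agreement(labels_a, labels_b):
    """Jaccard overlap of the rows two detectors mark as outliers (-1)."""
    a = np.asarray(labels_a) == -1
    b = np.asarray(labels_b) == -1
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 1.0  # neither detector flagged anything: treat as full agreement
    return float(np.logical_and(a, b).sum() / union)

# Toy labels (assumption): -1 = outlier, 1 = inlier.
iso_labels = np.array([1, -1, -1, 1, 1, -1])
lof_labels = np.array([1, -1, 1, 1, -1, -1])

agreement = flag_agreement(iso_labels, lof_labels)
print(agreement)  # 0.5: two rows flagged by both, four flagged by at least one
```

A high overlap does not prove either detector is right, but a low one is a useful prompt to review the disputed rows with a subject matter expert.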